This notebook is a guide to preparing the data and plots before building an interactive Shiny web app. It uses the New York City Airbnb Open Data from Kaggle, which describes listing activity and metrics in New York City in 2019. The goal is an interactive Shiny dashboard; the necessary groundwork, such as data cleaning and initial visualization, is done first in this notebook.
The following packages are required in this notebook. Use install.packages() to install any that are not installed yet, and load them with the library() function. A brief explanation of what each package is used for is provided.
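Checking which of these packages still need to be installed can be scripted. The sketch below is one possible approach, not part of the original workflow: it assumes a hypothetical helper, missing_pkgs(), built on base R's requireNamespace(), which returns FALSE instead of raising an error when a package is not installed.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded.
missing_pkgs <- function(pkgs) {
  # requireNamespace() returns FALSE (without an error) when a package is
  # not installed; quietly = TRUE suppresses its messages
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

pkgs <- c("tidyverse", "glue", "scales", "plotly", "lubridate", "leaflet")
missing_pkgs(pkgs)
```

When the result is non-empty, it can be passed straight to install.packages().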
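Packages for the Shiny app:
library(tidyverse)  # data wrangling (dplyr) and plotting (ggplot2)
library(glue)       # string interpolation for titles and tooltip text
library(scales)     # label helpers such as wrap_format()
library(plotly)     # ggplotly() turns ggplot2 plots into interactive plots
library(lubridate)  # date parsing with ymd()
library(leaflet)    # interactive maps

ab_nyc <- read.csv("data_input/AB_NYC_2019.csv")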
head(ab_nyc)
##     id                                             name host_id   host_name
## 1 2539               Clean & quiet apt home by the park    2787        John
## 2 2595                            Skylit Midtown Castle    2845    Jennifer
## 3 3647              THE VILLAGE OF HARLEM....NEW YORK !    4632   Elisabeth
## 4 3831                  Cozy Entire Floor of Brownstone    4869 LisaRoxanne
## 5 5022 Entire Apt: Spacious Studio/Loft by central park    7192       Laura
## 6 5099        Large Cozy 1 BR Apartment In Midtown East    7322       Chris
##   neighbourhood_group neighbourhood latitude longitude       room_type price
## 1            Brooklyn    Kensington 40.64749 -73.97237    Private room   149
## 2           Manhattan       Midtown 40.75362 -73.98377 Entire home/apt   225
## 3           Manhattan        Harlem 40.80902 -73.94190    Private room   150
## 4            Brooklyn  Clinton Hill 40.68514 -73.95976 Entire home/apt    89
## 5           Manhattan   East Harlem 40.79851 -73.94399 Entire home/apt    80
## 6           Manhattan   Murray Hill 40.74767 -73.97500 Entire home/apt   200
##   minimum_nights number_of_reviews last_review reviews_per_month
## 1              1                 9  2018-10-19              0.21
## 2              1                45  2019-05-21              0.38
## 3              3                 0                            NA
## 4              1               270  2019-07-05              4.64
## 5             10                 9  2018-11-19              0.10
## 6              3                74  2019-06-22              0.59
##   calculated_host_listings_count availability_365
## 1                              6              365
## 2                              2              355
## 3                              1              365
## 4                              1              194
## 5                              1                0
## 6                              1              129
str(ab_nyc)
## 'data.frame':    48895 obs. of  16 variables:
##  $ id                            : int  2539 2595 3647 3831 5022 5099 5121 5178 5203 5238 ...
##  $ name                          : chr  "Clean & quiet apt home by the park" "Skylit Midtown Castle" "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" ...
##  $ host_id                       : int  2787 2845 4632 4869 7192 7322 7356 8967 7490 7549 ...
##  $ host_name                     : chr  "John" "Jennifer" "Elisabeth" "LisaRoxanne" ...
##  $ neighbourhood_group           : chr  "Brooklyn" "Manhattan" "Manhattan" "Brooklyn" ...
##  $ neighbourhood                 : chr  "Kensington" "Midtown" "Harlem" "Clinton Hill" ...
##  $ latitude                      : num  40.6 40.8 40.8 40.7 40.8 ...
##  $ longitude                     : num  -74 -74 -73.9 -74 -73.9 ...
##  $ room_type                     : chr  "Private room" "Entire home/apt" "Private room" "Entire home/apt" ...
##  $ price                         : int  149 225 150 89 80 200 60 79 79 150 ...
##  $ minimum_nights                : int  1 1 3 1 10 3 45 2 2 1 ...
##  $ number_of_reviews             : int  9 45 0 270 9 74 49 430 118 160 ...
##  $ last_review                   : chr  "2018-10-19" "2019-05-21" "" "2019-07-05" ...
##  $ reviews_per_month             : num  0.21 0.38 NA 4.64 0.1 0.59 0.4 3.47 0.99 1.33 ...
##  $ calculated_host_listings_count: int  6 2 1 1 1 1 1 1 1 4 ...
##  $ availability_365              : int  365 355 365 194 0 129 0 220 0 188 ...
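Some information about the features:
* id, host_id: identifiers of the listing and of its host
* name, host_name: listing title and host name
* neighbourhood_group: borough (Bronx, Brooklyn, Manhattan, Queens, Staten Island)
* neighbourhood: neighbourhood within the borough
* latitude, longitude: coordinates of the listing
* room_type: type of accommodation (entire home/apartment, private room, or shared room)
* price: price per night in US dollars
* minimum_nights: minimum number of nights per booking
* number_of_reviews: total number of reviews
* last_review: date of the most recent review
* reviews_per_month: average number of reviews per month
* calculated_host_listings_count: number of listings the host has
* availability_365: number of days in the year the listing is available for booking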
By inspecting the data, we can see that some features have unsuitable types (for example, last_review is stored as character rather than Date) and that some values are missing (for example, reviews_per_month is NA for the third listing). Furthermore, the IDs are not needed for data visualization, so I am going to drop id and host_id first.
ab_nyc <- ab_nyc %>%
  select(-c(id, host_id))
The name neighbourhood_group is not very intuitive: its values, such as Brooklyn and Manhattan, are New York City boroughs. So, I will rename it to borough instead.
ab_nyc <- ab_nyc %>%
  rename(borough = neighbourhood_group)
Notice that room_type contains the value Entire home/apt. This value will later be displayed in the tooltip text of the interactive plot, so I will first change it, together with the other room types, into a more readable format.
unique(ab_nyc$room_type)
## [1] "Private room"    "Entire home/apt" "Shared room"
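The recode() call that follows is essentially a lookup from raw label to display label. As a base-R sketch of the same idea (not what this notebook itself uses), a named character vector can act as the lookup table; the helper name relabel() and the unseen value "Hotel room" are hypothetical. Like dplyr::recode() on a character vector, it leaves values that are not in the table unchanged.

```r
# Lookup table: names are the raw labels, values are the display labels
room_labels <- c(
  "Entire home/apt" = "Entire Home/Apartment",
  "Private room"    = "Private Room",
  "Shared room"     = "Shared Room"
)

# Hypothetical helper: relabel `x` via `lookup`, keeping unmatched values.
# `x` should be a character vector: indexing by a factor would use its
# integer codes instead of its labels.
relabel <- function(x, lookup) {
  out <- unname(lookup[x])        # NA wherever x is not a name in lookup
  unmatched <- is.na(out)
  out[unmatched] <- x[unmatched]  # fall back to the original value
  out
}

# "Hotel room" is a hypothetical label that is not in the lookup table
relabel(c("Private room", "Entire home/apt", "Hotel room"), room_labels)
```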
ab_nyc <- ab_nyc %>%
  # recode() still works, although dplyr >= 1.1.0 marks it as superseded by case_match()
  mutate(room_type = recode(room_type,
                            "Entire home/apt" = "Entire Home/Apartment",
                            "Private room" = "Private Room",
                            "Shared room" = "Shared Room"))
We need to convert the following features' types:
* borough: categorical (factor)
* neighbourhood: categorical (factor)
* room_type: categorical (factor)
* last_review: Date
ab_nyc <- ab_nyc %>%
  mutate(across(c(borough, neighbourhood, room_type), factor),
         # ymd() turns "" (listings without any review) into NA
         last_review = ymd(last_review))
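The str() output above shows last_review stored as character, with an empty string for the listing that has no review. Converting such strings to dates turns the empty strings into NA, which is where the missing last_review values counted below come from. A minimal base-R sketch with as.Date() on the first four printed values (the notebook itself uses lubridate::ymd(), which likewise returns NA for strings it cannot parse):

```r
# First four last_review values as read by read.csv(); "" means no review yet
raw_last_review <- c("2018-10-19", "2019-05-21", "", "2019-07-05")

# An explicit format stops as.Date() from guessing the format from the
# first element; strings that do not match it ("" here) become NA
last_review <- as.Date(raw_last_review, format = "%Y-%m-%d")

print(last_review)
sum(is.na(last_review))
```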
head(ab_nyc)
##                                               name   host_name   borough
## 1               Clean & quiet apt home by the park        John  Brooklyn
## 2                            Skylit Midtown Castle    Jennifer Manhattan
## 3              THE VILLAGE OF HARLEM....NEW YORK !   Elisabeth Manhattan
## 4                  Cozy Entire Floor of Brownstone LisaRoxanne  Brooklyn
## 5 Entire Apt: Spacious Studio/Loft by central park       Laura Manhattan
## 6        Large Cozy 1 BR Apartment In Midtown East       Chris Manhattan
##   neighbourhood latitude longitude             room_type price minimum_nights
## 1    Kensington 40.64749 -73.97237          Private Room   149              1
## 2       Midtown 40.75362 -73.98377 Entire Home/Apartment   225              1
## 3        Harlem 40.80902 -73.94190          Private Room   150              3
## 4  Clinton Hill 40.68514 -73.95976 Entire Home/Apartment    89              1
## 5   East Harlem 40.79851 -73.94399 Entire Home/Apartment    80             10
## 6   Murray Hill 40.74767 -73.97500 Entire Home/Apartment   200              3
##   number_of_reviews last_review reviews_per_month
## 1                 9  2018-10-19              0.21
## 2                45  2019-05-21              0.38
## 3                 0        <NA>                NA
## 4               270  2019-07-05              4.64
## 5                 9  2018-11-19              0.10
## 6                74  2019-06-22              0.59
##   calculated_host_listings_count availability_365
## 1                              6              365
## 2                              2              355
## 3                              1              365
## 4                              1              194
## 5                              1                0
## 6                              1              129
colSums(is.na(ab_nyc))
##                           name                      host_name
##                              0                              0
##                        borough                  neighbourhood
##                              0                              0
##                       latitude                      longitude
##                              0                              0
##                      room_type                          price
##                              0                              0
##                 minimum_nights              number_of_reviews
##                              0                              0
##                    last_review              reviews_per_month
##                          10052                          10052
## calculated_host_listings_count               availability_365
##                              0                              0
There are 10052 missing values in both last_review and reviews_per_month. Judging from the third row printed by head() above (number_of_reviews is 0 and both fields are missing), these appear to be listings that have no reviews yet, so there is no meaningful review date to impute. Besides, I do not think these features are important for the interactive plots, so I am going to drop them.
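The hunch that these values are missing because a listing has no reviews can be checked directly: the missing pattern of reviews_per_month should coincide with number_of_reviews == 0. A minimal sketch on the six rows printed by head() earlier:

```r
# Values from the first six rows printed by head(ab_nyc) above
number_of_reviews <- c(9, 45, 0, 270, 9, 74)
reviews_per_month <- c(0.21, 0.38, NA, 4.64, 0.10, 0.59)

# TRUE only if a value is missing exactly when the listing has no reviews
na_means_no_reviews <- identical(is.na(reviews_per_month),
                                 number_of_reviews == 0)
na_means_no_reviews
```

On the full data, the same comparison can be run before the columns are dropped, e.g. identical(is.na(ab_nyc$reviews_per_month), ab_nyc$number_of_reviews == 0). If it returns TRUE, the missing reviews_per_month values could be filled with 0 should they ever be needed, whereas last_review has no sensible fill value.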
ab_nyc <- ab_nyc %>%
  select(-c(last_review, reviews_per_month))
sum(duplicated(ab_nyc))
## [1] 0
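duplicated() flags every repeat of a row after its first occurrence, so the sum counts extra copies. Because id was dropped beforehand, two separate listings that agree on every remaining column would also be counted, so a result of 0 rules that out as well. A small illustration on a hypothetical toy data frame:

```r
# Hypothetical toy data: rows 1 and 3 are identical in every column
listings <- data.frame(
  name  = c("Cozy room", "Sunny loft", "Cozy room"),
  price = c(80, 150, 80)
)

duplicated(listings)       # only the second copy (row 3) is flagged
sum(duplicated(listings))  # number of extra copies

# Keep the first occurrence of each row
unique(listings)
```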
There are no duplicated rows in the dataset, so we can proceed to the visualization part. Before that, here are all the cleaning steps combined into a single pipeline:
#ab_nyc <- read.csv("data_input/AB_NYC_2019.csv")
#ab_nyc <- ab_nyc %>%
#  select(-c(id, host_id, last_review, reviews_per_month)) %>%
#  rename(borough = neighbourhood_group) %>%
#  mutate(across(c(borough, neighbourhood, room_type), factor)) %>%
#  mutate(room_type = recode(room_type,
#                            "Entire home/apt" = "Entire Home/Apartment",
#                            "Private room" = "Private Room",
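#                            "Shared room" = "Shared Room"))
Below are the features I want to add to the Shiny dashboard:
* an interactive bar plot of the top-n listings that changes based on the user's input
* an interactive map of the listings with clustered markers and popups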
To show the top-n listings, we need a metric to rank them by. However, the dataset has no review score. The only usable metric is number_of_reviews, which I think is appropriate: more reviews suggest more stays, i.e. a more popular listing. It does not guarantee that the listing is the best option (some reviews might be negative), but since there is no review score, let's proceed with the number of reviews for now.
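One detail worth knowing before ranking: dplyr's slice_max() keeps ties by default (with_ties = TRUE), so a "top 5" can return more than five rows when several listings share the fifth-highest review count, while with_ties = FALSE caps the result at n rows. The base-R sketch below mimics both behaviours with a hypothetical helper, top_by_reviews(), on toy data.

```r
# Hypothetical helper: rows with the `n` largest number_of_reviews values.
# with_ties = TRUE mimics slice_max()'s default and keeps every row tied
# with the n-th value; with_ties = FALSE returns at most n rows.
top_by_reviews <- function(df, n, with_ties = TRUE) {
  df <- df[order(df$number_of_reviews, decreasing = TRUE), , drop = FALSE]
  if (nrow(df) <= n) return(df)
  if (!with_ties) return(df[seq_len(n), , drop = FALSE])
  cutoff <- df$number_of_reviews[n]  # n-th highest count
  df[df$number_of_reviews >= cutoff, , drop = FALSE]
}

# Hypothetical toy data: listings C and D share the 2nd-highest count
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  number_of_reviews = c(10, 30, 20, 20)
)

top_by_reviews(toy, 2)                     # B, C and D (the tie is kept)
top_by_reviews(toy, 2, with_ties = FALSE)  # exactly two rows
```

For a bar chart that should always show exactly n bars, passing with_ties = FALSE to slice_max() is the corresponding choice.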
Although the final bar plot will change based on the user's input, here I will only create a single plot as the base. When building the Shiny dashboard, I will replace the hard-coded filters and title so that the plot reacts to the user's input. For now, the plot shows the top 5 private-room listings priced at $250 or less in Brooklyn and Manhattan.
bar_df <- ab_nyc %>%
  filter(borough %in% c("Brooklyn", "Manhattan"),
         room_type == "Private Room",
         price <= 250) %>%
  slice_max(number_of_reviews, n = 5)
bar_df
##                                       name host_name   borough   neighbourhood
## 1               Great Bedroom in Manhattan        Jj Manhattan          Harlem
## 2           Beautiful Bedroom in Manhattan        Jj Manhattan          Harlem
## 3             Private Bedroom in Manhattan        Jj Manhattan          Harlem
## 4 Manhattan Lux Loft.Like.Love.Lots.Look !     Carol Manhattan Lower East Side
## 5          LG Private Room/Family Friendly     Wanda  Brooklyn        Bushwick
##   latitude longitude    room_type price minimum_nights number_of_reviews
## 1 40.82085 -73.94025 Private Room    49              1               607
## 2 40.82124 -73.93838 Private Room    49              1               597
## 3 40.82264 -73.94041 Private Room    49              1               594
## 4 40.71921 -73.99116 Private Room    99              2               540
## 5 40.70283 -73.92131 Private Room    60              3               480
##   calculated_host_listings_count availability_365
## 1                              3              293
## 2                              3              342
## 3                              3              339
## 4                              1              179
## 5                              1                0
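In the dashboard, the hard-coded boroughs, room type, price cap and n will come from user input. One way to prepare for that is to wrap the filter-and-rank step in a function. The sketch below is a base-R version with a hypothetical name, top_listings(), tried on values from the head() output; unlike slice_max(), head() never returns more than n rows, so ties are not kept.

```r
# Hypothetical helper: keep listings matching the user's choices, then
# return the `n` with the most reviews (ties are not kept).
top_listings <- function(df, boroughs, room, max_price, n = 5) {
  keep <- df$borough %in% boroughs &   # works for character or factor columns
    df$room_type == room &
    df$price <= max_price
  out <- df[keep, , drop = FALSE]
  out <- out[order(out$number_of_reviews, decreasing = TRUE), , drop = FALSE]
  head(out, n)
}

# Columns taken from the first six rows printed by head(ab_nyc) earlier
sample_rows <- data.frame(
  borough = c("Brooklyn", "Manhattan", "Manhattan",
              "Brooklyn", "Manhattan", "Manhattan"),
  room_type = c("Private Room", "Entire Home/Apartment", "Private Room",
                "Entire Home/Apartment", "Entire Home/Apartment",
                "Entire Home/Apartment"),
  price = c(149, 225, 150, 89, 80, 200),
  number_of_reviews = c(9, 45, 0, 270, 9, 74)
)

top_listings(sample_rows, c("Brooklyn", "Manhattan"), "Private Room", 250)
```

Inside a Shiny server function this might become top_listings(ab_nyc, input$borough, input$room_type, input$max_price, input$n), where those input IDs are placeholders for whatever the UI defines.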
bar_plot <- bar_df %>%
  ggplot(mapping = aes(x = reorder(name, number_of_reviews),
                       y = number_of_reviews,
                       # tooltip text, shown by ggplotly(tooltip = "text")
                       text = glue("{name}\nLocation: {neighbourhood}, {borough}\nPrice: ${price}\nReviews Count: {number_of_reviews}"))) +
  geom_col(fill = "#2c3e50") +
  # review count printed just past the end of each bar
  geom_text(aes(label = number_of_reviews,
                y = number_of_reviews + 12),
            size = 3,
            colour = "black") +
  labs(title = glue("Top 5 Private Room Listings up to $250 in Brooklyn and Manhattan"),
       x = NULL,
       y = "Number of Reviews") +
  scale_x_discrete(labels = wrap_format(20)) +  # wrap long listing names
  coord_flip() +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

ggplotly(bar_plot, tooltip = "text") %>%
  layout(hoverlabel = list(bgcolor = "#b5e2ff"))  # hex colours need the leading "#"
Next, I create the marker icon and the popup content for the map.
bnb_icon <- makeIcon(
iconUrl = "assets/home.png",
iconWidth = 30,
iconHeight = 30
)
# Popup HTML for each listing (one string per row of ab_nyc)
popup <- paste0(ab_nyc$name, "<br>",
                "Room Type: ", ab_nyc$room_type, "<br>",
                "Price: $", ab_nyc$price, "<br>",
                "Number of Reviews: ", ab_nyc$number_of_reviews)
Next, I create the map. Setting minZoom = 10 stops users from zooming out beyond the city, which keeps the map focused on New York City.
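bubble_map <- leaflet(options = leafletOptions(zoomControl = FALSE,  # hide the +/- buttons
                                               minZoom = 10)) %>%   # limit zooming out
  setView(lng = -73.935242, lat = 40.730610, zoom = 10) %>%         # centre on NYC
  addTiles() %>%  # default OpenStreetMap tiles (largely hidden by the CartoDB layer below)
  addMarkers(lat = ab_nyc$latitude,
             lng = ab_nyc$longitude,
             icon = bnb_icon,
             popup = popup,
             clusterOptions = markerClusterOptions()) %>%  # group nearby markers
  addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
  # Note: recent leaflet releases may list Stamen tiles under Stadia Maps names
  # (e.g. providers$Stadia.StamenTonerLines), which can require an API key
  addProviderTiles(providers$Stamen.TonerLines,
                   options = providerTileOptions(opacity = 0.5)) %>%
  addProviderTiles(providers$Stamen.TonerLabels) %>%
  addProviderTiles(providers$OpenSeaMap)  # nautical overlay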
bubble_map
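Leaflet treats popup strings as HTML, so listing names are inserted as markup. A plain ampersand, as in "Clean & quiet apt home by the park", usually displays fine, but a name containing < could be read as the start of a tag. htmltools::htmlEscape() is the usual tool for this; the base-R sketch below spells the idea out with hypothetical helpers, escape_html() and make_popup(), which reproduce the popup layout used above.

```r
# Hypothetical helper: escape the characters that are special in HTML.
# "&" is replaced first so the entities added afterwards are not re-escaped.
escape_html <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x <- gsub("\"", "&quot;", x, fixed = TRUE)
  x
}

# Hypothetical helper: same popup layout as above, built from escaped text
# fields; paste0() is vectorised, so whole columns can be passed at once
make_popup <- function(name, room_type, price, reviews) {
  paste0(escape_html(name), "<br>",
         "Room Type: ", escape_html(room_type), "<br>",
         "Price: $", price, "<br>",
         "Number of Reviews: ", reviews)
}

make_popup("Clean & quiet apt home by the park", "Private Room", 149, 9)
```

Applied to the whole dataset, this would be make_popup(ab_nyc$name, ab_nyc$room_type, ab_nyc$price, ab_nyc$number_of_reviews); gsub() coerces the room_type factor to its labels.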